Abstract: Wireless sensor networks are composed of distributed
sensors that can be used for signal detection or
classification. The likelihood functions of the hypotheses are 
often not known in advance, and decision rules have to be 
learned via supervised learning. A specific such algorithm is 
Fisher discriminant analysis (FDA), the classification accuracy 
of which has been previously studied in the context of wireless 
sensor networks. Previous work, however, does not take into 
account the communication protocol or battery lifetime of the 
sensor networks; in this paper we extend the existing studies by 
proposing a model that captures the relationship between battery 
lifetime and classification accuracy. In order to do so we combine 
the FDA with a model that captures the dynamics of the Carrier- 
Sense Multiple-Access (CSMA) algorithm, the random-access 
algorithm used to regulate communications in sensor networks. 
This allows us to study the interaction between the classification 
accuracy, battery lifetime and effort put towards learning, as well 
as the impact of the back-off rates of CSMA on the accuracy. 
We characterize the tradeoff between the length of the training 
stage and accuracy, and show that accuracy is non-monotone in 
the back-off rate due to changes in the training sample size and 
overfitting.

I. Introduction 

Wireless sensor networks are used for detection or classifi- 
cation, whether for surveillance, environmental monitoring, or 
any of the myriad other application domains that are emerging 
in the age of big data. In many such applications, the likelihood 
functions of the hypotheses, e.g., the presence or absence of 
a particular physical phenomenon, are not known before the 
sensor network is deployed; in these applications, the sensor 
network requires training prior to operation via supervised 
learning [1], [2], [3], [4], [5]. The resulting classification
accuracy improves with the number of measurements taken
during training [6], but increasing the length of the training stage
further reduces the limited battery capacity for the operational 
stage. Therefore, the amount of resources expended during 
training mediates operational lifetime and accuracy of the 
sensor network. 

The energy consumption of sensor nodes, and thus the 
lifetime of the network, is dominated by energy expended on 
communication. Node transmissions in wireless sensor networks
are commonly regulated by the Carrier-Sense Multiple-Access
(CSMA) algorithm [7], [8], [9], [10]. This algorithm is
implemented in TinyOS, a popular open source operating system
for wireless sensor networks, and is part of the IEEE 802.15.4
standard for wireless sensor network communication [11]. Nodes
using CSMA access the medium in a distributed manner, and wait
some random back-off time between successive transmissions.

In this paper we consider a scenario where a set of mea- 
surements and classification is required every time unit. Only 
nodes that are active at that time perform a measurement and 
transmit the result, so the number of measurements collected 
varies over time. We develop and analyze a model of sensor 
networks that perform supervised classification in situ, using 
the Fisher discriminant analysis (FDA) learning algorithm, 
with a training stage and an operational stage enabled by 
CSMA. The specific analysis of focus is the relationship 
between operational accuracy and lifetime, which we show to 
be of a fundamentally different character than for the case of 
detection with known likelihood functions, due to overfitting.
In characterizing operational classification accuracy (in contrast
to classification accuracy on training samples), we make
use of generalization approximations for FDA developed by
Raudys et al. [12].

Battery capacity is characterized by the number of transmis- 
sions (and thus measurements) that can be performed, whether 
they be during training or operation. As every measurement 
corresponds to one transmission, the expected network lifetime 
is inversely proportional to the node throughput in our model. 
The performance measures of interest are the classification 
accuracy and operational lifetime, which is the lifetime spent 
in the operational stage, not in the training stage. The two 
main parameters available for configuring the sensor network 
are the CSMA back-off rates (the reciprocal of the mean back- 
off time), and the fraction of the lifetime spent in the training 
stage. 

As the back-off rates of the nodes increase, states with many
actively transmitting nodes become more likely. This requires more
energy consumption, and also affects classification accuracy.
Classification accuracy is not monotonically increasing in the
number of active nodes due to the phenomenon of overfitting,
as we discuss in [6]. We also show that operational accuracy as
a function of back-off rate exhibits the hallmarks of overfitting
in one regime, but in another regime, has a behavior quite
different than any behavior usually encountered in statistical
learning [13].

The analysis of supervised classification for sensor networks
in the research literature is limited [2]: investigations
have been predominantly concerned with the detection case
where the likelihood functions are known. Moreover, sensor
network research tends to separate learning issues from the
communication aspect. There are several works that model
CSMA communication in sensor networks generally, e.g., [14]
and references therein, but not with the supervised classification
application as part of the formulation. Cross-layer
work that does consider the networking issues together with
a detection or estimation application, e.g., the correlation-based
collaborative MAC protocol [15], is again focused
on the case with known likelihoods. So although FDA and
the performance of CSMA-like algorithms have been widely
studied in the research literature, we are the first to jointly
consider classification accuracy and communication aspects of
wireless sensor networks.

We consider both the case of statistically independent and 
identically distributed (i.i.d.) measurements from different 
nodes, and the case of measurements exhibiting correlation 
that depends on the spatial distance between the nodes. Having
i.i.d. measurements is a common simplifying assumption
in wireless sensor network detection [16]. A model with
spatially-correlated measurements is much closer to reality
in most applications [17], [18]. We assume that the learning
algorithm has no prior information on the distribution and 
correlation of the measurements; the FDA has to estimate 
means and covariances as part of the training stage. The 
spatial correlation is encoded via a Gauss-Markov random 
field (GMRF) model. 

The CSMA model under consideration was first introduced
in the 1980s in the context of packet radio networks [19], [20]
and was later applied to networks based on the IEEE 802.11
standard [21], [22], [23], [24]. More recently, it has been used
to study so-called adaptive CSMA algorithms, where the back-off
rate of the nodes changes with their congestion level [25],
[26], [27], [28]. Although the representation of the binary
exponential back-off mechanism in the above-mentioned models is
far less detailed than in the landmark work of Bianchi [29]
and similar results focusing on sensor networks, e.g., [14],
[30], the general interference graph offers greater versatility
and covers a broad range of topologies.

The remainder of the paper is organized as follows. In
Section II we describe the setup of the sensor network system
from the FDA supervised classification perspective and in
Section III we describe the setup of the sensor network system
from the CSMA communication perspective. In Section IV
we derive the relationship between operational lifetime and
accuracy, and Section V presents numerical results of lifetime
and accuracy for two special cases, illustrating the complicated
balancing act that is involved. Section VI provides a discussion
and several ideas for future directions of research.



II. Fisher Discriminant Analysis 

Consider a sensor network consisting of $n$ sensor nodes
each taking a scalar measurement combined into a joint
measurement vector $\mathbf{x}_j \in \mathbb{R}^n$. In the general
supervised classification problem, we are given $m$ sample pairs
$\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m)\}$ known as the
training set, with measurement $\mathbf{x}_j$ and the class label or
hypothesis $y_j \in \{0, 1\}$. The
training samples are acquired by the network after deployment
and before the operational stage. The availability of labels for
the training measurements is an assumption made in [1], [2],
[3], [4] as well. Once the training set is acquired, the samples
are used to learn a classification function or decision rule
$\hat{y}(\cdot)$ that will accurately classify new unseen and
unlabeled samples $\mathbf{x}$ from the same distribution from
which the training set was drawn.

In this paper, we focus on a simple, classical decision rule
$\hat{y}$, the Fisher discriminant analysis classifier [31], [32]:

$$\hat{y}(\mathbf{x}) = \operatorname{step}(\mathbf{w}^T \mathbf{x} + \theta), \tag{1}$$

where

$$\mathbf{w} = (\hat{\Sigma}_0 + \hat{\Sigma}_1)^{-1} (\hat{\mu}_1 - \hat{\mu}_0), \qquad \theta = -\tfrac{1}{2} \mathbf{w}^T (\hat{\mu}_0 + \hat{\mu}_1), \tag{2}$$

and $\hat{\mu}_0$, $\hat{\mu}_1$, $\hat{\Sigma}_0$, and $\hat{\Sigma}_1$ are the
conditional sample means and covariances of the $m$ training samples.
The Fisher discriminant analysis rule is a plug-in classifier that
follows from the likelihood ratio test for optimal signal detection
between Gaussian signals with the same covariance and different
means. The rule (1) is applied in the operational stage of the
sensor network to classify new observations.
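As an illustration of the plug-in rule (1)-(2), the following is a minimal sketch, assuming NumPy, that $\mathbf{w}$ is formed from the summed conditional sample covariances, and that $\theta$ is the midpoint threshold. The function names `fit_fda`, `predict_fda`, and `demo` are hypothetical and not part of the paper; the synthetic data use mean vectors of all zeroes and all ones with identity covariance.

```python
import numpy as np

def fit_fda(X, y):
    """Return (w, theta) of the plug-in Fisher rule from labeled samples.

    X has one training sample per row; y holds the 0/1 labels.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Conditional sample covariances (rows are samples, hence rowvar=False).
    S0 = np.atleast_2d(np.cov(X0, rowvar=False))
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    # Solve (S0 + S1) w = mu1 - mu0 instead of forming an explicit inverse.
    w = np.linalg.solve(S0 + S1, mu1 - mu0)
    theta = -0.5 * w @ (mu0 + mu1)
    return w, theta

def predict_fda(X, w, theta):
    """Apply the step rule: label 1 when w^T x + theta is positive."""
    return (X @ w + theta > 0).astype(int)

def demo(n=3, m=400, seed=0):
    """Train on m synthetic samples and return accuracy on fresh samples."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=m)
    X = rng.normal(size=(m, n)) + y[:, None]      # class mean is 0 or all ones
    w, theta = fit_fda(X, y)
    y_new = rng.integers(0, 2, size=5000)
    X_new = rng.normal(size=(5000, n)) + y_new[:, None]
    return float(np.mean(predict_fda(X_new, w, theta) == y_new))
```

The accuracy returned by `demo` is measured on samples not used for training, which is the operational (generalization) accuracy that the rest of the analysis is concerned with.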

Given the FDA decision rule (1), we would like to characterize
its performance, specifically its classification accuracy
as it generalizes to new unseen samples in the operational
stage. Generalization accuracy, however, is a functional of
the underlying data distribution $f_{\mathbf{x},y}(\mathbf{x}, y)$, and
we must first specify a probability distribution of the sensor
measurements. We employ the same GMRF statistical model for sensor
measurements as [6], [33]. That is, the $n$ sensor nodes are
deployed on the plane with spatial locations $v_i \in \mathbb{R}^2$,
$i = 1, \ldots, n$. The likelihoods of the two hypotheses are Gaussian:
$f_{\mathbf{x}|y}(\mathbf{x} \mid y = 0) \sim \mathcal{N}(\mu_0, \Sigma)$ and
$f_{\mathbf{x}|y}(\mathbf{x} \mid y = 1) \sim \mathcal{N}(\mu_1, \Sigma)$.
The prior probabilities of the hypotheses are equal:
$\Pr[y = 0] = \Pr[y = 1] = 1/2$. For simplicity of exposition
$\mu_0 = \mathbf{0}$ (the vector of all zeroes) and
$\mu_1 = \mathbf{1}$ (the vector of all ones).

The covariance structure is based on the Euclidean nearest
neighbor graph of the sensors: the (undirected) nearest neighbor
graph contains an edge between sensor $i$ and sensor $i'$ if
sensor $i$ is the nearest neighbor of sensor $i'$ or if sensor $i'$
is the nearest neighbor of sensor $i$. The set of edges in the
nearest neighbor graph is denoted $\mathcal{E}$. The diagonal elements
of $\Sigma$ are all equal to $\sigma^2$. The elements of $\Sigma$
corresponding to edges in the nearest neighbor graph are:

$$\{\Sigma\}_{i,i'} = \sigma^2 g(d(v_i, v_{i'})), \quad (i, i') \in \mathcal{E}, \tag{3}$$

where $g(\cdot) : \mathbb{R}_+ \to (0, 1)$ is a decreasing function that
encodes correlation decay with distance. The inverse covariance matrix
$J = \Sigma^{-1}$ is used to specify the remaining elements. The
off-diagonal elements of $J$ corresponding to sensor pairs
that do not have an edge in the nearest neighbor graph are
zero, i.e.

$$\{J\}_{i,i'} = 0, \quad i \neq i', \; (i, i') \notin \mathcal{E}. \tag{4}$$

We also consider the case of i.i.d. observations in the paper,
in which case $g(d) = 1$ for $d = 0$ and $g(d) = 0$ otherwise.

A highly accurate approximation of generalization accuracy
$A = \Pr[\hat{y}(\mathbf{x}) = y]$ for the FDA decision rule as described
above is found in [6]. Based on [12], this approximation is
given as
                                             $m > n$, (5)

where $\Phi(\cdot)$ is the Gaussian cumulative distribution function
and $\delta$ is known as the Mahalanobis distance

$$\delta^2 = \frac{n}{\sigma^2} - \frac{2}{\sigma^2} \sum_{(i,i') \in \mathcal{E}} \frac{g(d(v_i, v_{i'}))}{1 + g(d(v_i, v_{i'}))}. \tag{6}$$

In case $m < n$ there are insufficient training samples for
accurate classification and we have $A = 0.5$. In the i.i.d. case,
$\delta^2$ simplifies to $n/\sigma^2$.
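As a numerical sanity check on the squared Mahalanobis distance, the sketch below compares the edge-sum form (6) with the direct definition $\delta^2 = (\mu_1 - \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0)$ on a three-node chain. It assumes NumPy, all-ones mean difference, and that the GMRF constraint (4) makes the correlation between the two end nodes equal to the product of the two edge correlations; the helper names are hypothetical.

```python
import numpy as np

def chain_covariance(g12, g23, sigma2=1.0):
    """Covariance of a 3-node chain GMRF with edge correlations g12 and g23.

    The (1,3) entry is g12 * g23, which is what a zero (1,3) entry in the
    inverse covariance implies for a chain.
    """
    a, b = g12, g23
    return sigma2 * np.array([[1.0, a, a * b],
                              [a, 1.0, b],
                              [a * b, b, 1.0]])

def delta2_direct(cov):
    """Squared Mahalanobis distance 1^T cov^{-1} 1 between the class means."""
    ones = np.ones(cov.shape[0])
    return float(ones @ np.linalg.solve(cov, ones))

def delta2_edges(n, edge_corrs, sigma2=1.0):
    """Edge-sum form: (n - 2 * sum_e g_e / (1 + g_e)) / sigma2."""
    return (n - 2.0 * sum(g / (1.0 + g) for g in edge_corrs)) / sigma2
```

For example, `delta2_direct(chain_covariance(0.5, 0.3))` and `delta2_edges(3, [0.5, 0.3])` agree to numerical precision, and setting all edge correlations to zero recovers the i.i.d. value $n/\sigma^2$.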

III. Carrier-Sense Multiple-Access

The CSMA algorithm is an example of a random-access 
algorithm, where nodes decide for themselves when to trans- 
mit, based on local information only. We assume that the n 
nodes share the wireless medium according to a CSMA-type 
protocol. 

The network is described by an undirected conflict graph
$(V, \mathcal{E})$, where the set of vertices $V = \{1, \ldots, n\}$
represents the nodes of the network and the set of edges
$\mathcal{E} \subseteq V \times V$ indicates
which pairs of nodes cannot activate simultaneously. For ease
of presentation we assume that the conflict graph is the same
as the nearest neighbor graph introduced in Section II. Nodes
that are neighbors in the conflict graph are prevented from
simultaneous activity by the carrier-sensing mechanism. An
inactive node is said to be blocked whenever any of its
neighbors is active, and unblocked otherwise.

The transmission times of node $i$ are independent and
exponentially distributed with unit mean. When node $i$ is
blocked it remains silent until all its neighbors are inactive, at
which point it tries to activate after an exponentially distributed
back-off time with mean $1/\nu_i$.

The set $\Omega$ of all feasible joint activity states of the network
in this case corresponds to the incidence vectors of all independent
sets of the conflict graph. Let the network state at time $t$
be denoted by $Y(t) = (Y_1(t), Y_2(t), \ldots, Y_n(t)) \in \Omega$, with
$Y_i(t)$ indicating whether node $i$ is active at time $t$ ($Y_i(t) = 1$)
or not ($Y_i(t) = 0$). Then $\{Y(t)\}_{t \geq 0}$ is a Markov process which
is fully specified by the state space $\Omega$ and the transition rates

$$r(\omega, \omega') = \begin{cases} \nu_i, & \text{if } \omega' = \omega + e_i \in \Omega, \\ 1, & \text{if } \omega' = \omega - e_i \in \Omega, \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

Here $e_i$ denotes the vector of length $n$ with all zeroes except
for a 1 at position $i$.

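Since $Y(t)$ is reversible (see [19]), the following product-form
stationary distribution $\pi$ exists:

$$\pi(\omega) = \begin{cases} Z^{-1} \prod_{i=1}^{n} \nu_i^{\omega_i}, & \text{if } \omega \in \Omega, \\ 0, & \text{otherwise,} \end{cases} \tag{8}$$

where

$$Z = \sum_{\omega \in \Omega} \prod_{i=1}^{n} \nu_i^{\omega_i} \tag{9}$$

is the normalization constant that makes $\pi$ a probability
measure.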
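The product-form claim can be checked numerically on small conflict graphs. The sketch below, assuming NumPy and 0-indexed nodes, enumerates the feasible states, builds the generator matrix from activation rate $\nu_i$ and unit deactivation rate, and verifies that the product-form vector $\pi$ satisfies $\pi Q = 0$. The function names are hypothetical.

```python
from itertools import product
import numpy as np

def feasible_states(n, edges):
    """All 0/1 activity vectors with no two conflicting nodes both active."""
    return [s for s in product((0, 1), repeat=n)
            if not any(s[i] and s[j] for i, j in edges)]

def generator(states, nu):
    """Generator matrix Q of the activity process on the feasible states."""
    index = {s: k for k, s in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for s in states:
        for i in range(len(s)):
            t = list(s)
            t[i] = 1 - s[i]          # flip node i
            t = tuple(t)
            if t in index:           # transition must stay feasible
                # Activation happens at rate nu[i], deactivation at rate 1.
                Q[index[s], index[t]] = nu[i] if s[i] == 0 else 1.0
    Q -= np.diag(Q.sum(axis=1))      # rows of a generator sum to zero
    return Q

def product_form(states, nu):
    """Normalized weights prod_i nu_i^{omega_i} over the feasible states."""
    weights = np.array([np.prod([nu[i] ** s[i] for i in range(len(s))])
                        for s in states])
    return weights / weights.sum()
```

For a three-node chain with edges `[(0, 1), (1, 2)]` this yields five feasible states, and `product_form(states, nu) @ generator(states, nu)` is numerically zero for any positive back-off rates.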

The rate $\theta_i$ at which sensor node $i$ makes observations (or,
alternatively, the rate at which it performs transmissions) is referred
to as the throughput of this node, and may be written as

$$\theta_i = \sum_{\omega \in \Omega} \pi(\omega) \, \mathbb{I}\{\omega_i = 1\}. \tag{10}$$



Sensor nodes rely on batteries for energy, and we assume that
all nodes have a battery that allows them to make $l$ transmissions
each before their battery is drained. Consequently, the
expected lifetime of a node can be written as

$$T_i = \frac{l}{\theta_i}. \tag{11}$$

The activity process in the training stage is the same as in
the operational stage. We denote by $0 < \alpha < 1$ the fraction
of the battery capacity that is dedicated to training the sensor
network. So the training lifetime of node $i$ is $\alpha T_i$, and the
operational lifetime (the quantity we would like to be large)
is:

$$U_i = (1 - \alpha) T_i = (1 - \alpha) \frac{l}{\theta_i}. \tag{12}$$
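To make the chain from back-off rates to lifetime concrete, the following sketch computes the stationary distribution, the per-node throughputs, and the total and operational lifetimes for an arbitrary small conflict graph. It assumes 0-indexed nodes and that lifetime equals the transmission budget divided by the throughput; `stationary`, `throughputs`, and `lifetimes` are hypothetical helper names.

```python
from itertools import product

def stationary(n, edges, nu):
    """Product-form stationary distribution over feasible activity states."""
    states = [s for s in product((0, 1), repeat=n)
              if not any(s[i] and s[j] for i, j in edges)]
    weights = []
    for s in states:
        w = 1.0
        for i in range(n):
            if s[i]:
                w *= nu[i]           # one factor nu_i per active node
        weights.append(w)
    Z = sum(weights)                 # normalization constant
    return {s: w / Z for s, w in zip(states, weights)}

def throughputs(pi, n):
    """theta_i: stationary probability that node i is active."""
    return [sum(p for s, p in pi.items() if s[i]) for i in range(n)]

def lifetimes(theta, budget, alpha):
    """Total lifetime budget/theta_i and operational share (1 - alpha) of it."""
    total = [budget / th for th in theta]
    operational = [(1 - alpha) * t for t in total]
    return total, operational
```

With a three-node chain, equal back-off rates of 1, a budget of 100 transmissions, and a training fraction of 0.2, the middle node has throughput 1/5 and the outer nodes 2/5, so the middle node outlives its neighbors by a factor of two. This imbalance is what the unequal back-off rates in the three-node special case are chosen to remove.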



The model we have specified is fully general for any $n$-node
conflict graph. We work with this general model throughout
the remainder of the paper, but also focus on two illustrative
special cases. The two special cases of the CSMA network
we consider are an $n$-node network where all nodes are
disjoint and a three-node linear network.

A. Independent Nodes 

First, consider an $n$-node network where all nodes can
be active simultaneously. This corresponds to an interference
graph with an empty edge set $\mathcal{E} = \emptyset$. We have
$\Omega = \{0, 1\}^n$ and set $\nu_i = \nu$ so the stationary
distribution (8) simplifies to

$$\pi(\omega) = \frac{\nu^{\sum_{i=1}^{n} \omega_i}}{(\nu + 1)^n}. \tag{13}$$

The stationary probability of any particular state only depends
on the number of active nodes in that state and on the back-off
rate $\nu$. Thus, for notational convenience, we introduce $\pi(k)$ as
the stationary probability of being in any state with $k$ active
nodes, and we write

$$\pi(k) = \frac{1}{(\nu + 1)^n} \binom{n}{k} \nu^k, \tag{14}$$

which follows since there are $\binom{n}{k}$ different activity states with
$k$ nodes transmitting.
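The distribution over the number of active nodes is a binomial distribution with $n$ trials and success probability $\nu/(\nu+1)$, which can be verified with a few lines of standard-library Python. The function names `pi_k` and `expected_active` are hypothetical.

```python
from math import comb

def pi_k(k, n, nu):
    """Stationary probability that exactly k of n independent nodes are active."""
    return comb(n, k) * nu**k / (nu + 1)**n

def expected_active(n, nu):
    """Expected number of active nodes under the stationary distribution."""
    return sum(k * pi_k(k, n, nu) for k in range(n + 1))
```

The probabilities sum to one for any positive $\nu$, and `expected_active(n, nu)` equals $n\nu/(\nu+1)$, so the expected number of active sensors grows from 0 toward $n$ as the back-off rate increases.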

With equal back-off rates and disjoint nodes, the stationary
throughput (10) is the same for all nodes:

$$\theta_i = \theta = \frac{\nu}{\nu + 1}. \tag{15}$$

Moreover, all nodes have the same lifetime, and the operational
lifetime of the network may be written as

$$U_i = U = (1 - \alpha) \, l \, \frac{\nu + 1}{\nu}. \tag{16}$$

B. A Three-Node Linear Network

Consider the three-node network where the nodes are positioned
such that the carrier-sensing mechanism prevents node 2
from activating while either node 1 or node 3 is active. Nodes
1 and 3 can be active simultaneously, but their observations
are correlated. The network can take five possible states

$$\Omega = \{0, e_1, e_2, e_3, e_1 + e_3\}. \tag{17}$$

Using (8) we compute the following stationary probabilities:

$$\pi(0) = Z^{-1}, \qquad \pi(e_i) = Z^{-1} \nu_i, \; i = 1, 2, 3, \qquad \pi(e_1 + e_3) = Z^{-1} \nu_1 \nu_3. \tag{18}$$

In order to make sure that all nodes have the same throughput
and lifetime, we fix some parameter $\eta > 0$ and choose
$\nu_1 = \nu_3 = \eta$ and $\nu_2 = \eta(\eta + 1)$. So node 2 has a
shorter mean back-off time in order to compensate for its
disadvantageous position in the network, and all nodes have
throughput (see [24])

$$\theta_i = \theta = \frac{\eta}{2\eta + 1} \tag{19}$$

and operational lifetime

$$U_i = U = (1 - \alpha) \, l \, \frac{2\eta + 1}{\eta}. \tag{20}$$



The normalization constant with these back-off rates is given
by

$$Z = 2\eta^2 + 3\eta + 1. \tag{21}$$
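The effect of the choice $\nu_1 = \nu_3 = \eta$, $\nu_2 = \eta(\eta+1)$ can be confirmed directly from the five stationary probabilities. The sketch below is plain Python; the function name `three_node` and the string state labels are hypothetical.

```python
def three_node(eta):
    """Stationary quantities of the three-node chain with equalizing rates."""
    nu1, nu2, nu3 = eta, eta * (eta + 1), eta
    # Unnormalized product-form weights of the five feasible states.
    weights = {"0": 1.0, "e1": nu1, "e2": nu2, "e3": nu3, "e1+e3": nu1 * nu3}
    Z = sum(weights.values())
    pi = {s: w / Z for s, w in weights.items()}
    # A node's throughput is the total probability of states where it is active.
    theta = (pi["e1"] + pi["e1+e3"], pi["e2"], pi["e3"] + pi["e1+e3"])
    return Z, pi, theta
```

For any $\eta > 0$ the three throughputs coincide and equal $\eta/(2\eta+1)$, and the returned normalization constant matches $2\eta^2 + 3\eta + 1$. As $\eta$ grows, the common throughput approaches 1/2 rather than 1, because node 2 can never be active together with its neighbors.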



IV. Relationship Between Lifetime and Accuracy 

We are now in position to combine the FDA model from
Section II and the CSMA model presented in Section III to
derive the relationship between generalization accuracy and
operational lifetime. This is mediated by two parameters: the
back-off rate $\nu$ or $\eta$ and the fraction of the lifetime spent in
the training stage $\alpha$.

Due to the interference constraints and the intermittent
nature of CSMA communications, not all nodes produce
and validly communicate measurements at all times. So the
training samples are acquired under different activity states
$\omega \in \Omega$. Thus studying the relationship between accuracy and
lifetime is not simply a matter of joining the corresponding
expressions (5) and (12).

This issue of incomplete data due to the activity process
can be addressed in several ways, including data imputation
[34]. Although various elaborate schemes are available, they
come at the cost of additional computation, communication,
and coordination that are at a premium in the sensor network
setting. Instead, we choose to model the classification by having
separately learned classifiers for different activity states.
In the operational stage the appropriate classifier is used for
prediction based on the activity state of the measurements. In
this setup, we associate with each state $\omega$ a number of training
samples

$$m_\omega = \frac{\alpha l}{\theta} \, \pi(\omega). \tag{22}$$

Then we compute the overall generalization accuracy as the
weighted sum of the individual generalization accuracies for
each pattern according to their stationary probabilities:

$$A = \sum_{\omega \in \Omega} \pi(\omega) A_\omega, \tag{23}$$

where $A_\omega$ denotes the approximation (5) evaluated for the
classifier of state $\omega$, with $\pi$ the stationary distribution (8)
and $m_\omega$ as in (22).

We now compute the generalization accuracies for the two
special cases introduced in Section III with the GMRF of the
measurements having the same graph structure as the CSMA
network.

A. Independent Nodes 

As discussed in Section III-A, for a set of $n$ disjoint nodes,
all patterns with $k$ active nodes have the same stationary
probability $\pi(k)$ given in (14), and all nodes have equal
throughput (15) and lifetime (16). We denote by $m_k$ the
number of training samples for patterns with $k$ active nodes,
and by summing (22) over all states with $k$ active nodes, we
write

$$m_k = \alpha l \binom{n}{k} \frac{\nu^{k-1}}{(\nu + 1)^{n-1}}. \tag{24}$$

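As discussed in Section II, with i.i.d. measurements from $n$
sensors, the squared Mahalanobis distance is $n/\sigma^2$. Thus, with
$k$ active sensors, the squared Mahalanobis distance is $k/\sigma^2$.
Substituting the expression for the stationary distribution (14)
and the number of training samples (24) into the expression
for the generalization accuracy (23) we obtain

$$A = \sum_{k=0}^{n} \frac{1}{(\nu + 1)^n} \binom{n}{k} \nu^k A_k, \tag{25}$$

where $A_k$ denotes the approximation (5) evaluated with $k$ sensors,
squared Mahalanobis distance $k/\sigma^2$, and $m_k$ training samples.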



B. A Three-Node Linear Network 

Recall from Section III-B that the three-node network has
5 feasible states. The four non-empty states have squared
Mahalanobis distance

$$\delta_{e_i}^2 = \frac{1}{\sigma^2}, \; i = 1, 2, 3, \qquad \delta_{e_1 + e_3}^2 = \frac{2}{\sigma^2} \cdot \frac{1}{1 + g(d(v_1, v_2)) \, g(d(v_2, v_3))}. \tag{26}$$

Note that since $g < 1$, the Mahalanobis distance of the larger
state is larger than that of the states with only one node active,
and is more valuable.

Evaluating (22) we obtain an expression for the number of
training samples for each state:

$$m_0 = \alpha l \, \frac{1}{\eta (1 + \eta)}, \qquad m_{e_i} = \alpha l \, \frac{1}{\eta + 1}, \; i = 1, 3, \qquad m_{e_2} = \alpha l, \qquad m_{e_1 + e_3} = \alpha l \, \frac{\eta}{\eta + 1}. \tag{27}$$
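The per-state training sample counts for the three-node network can be reproduced numerically. The sketch below assumes that the number of samples for a state is $\alpha l \, \pi(\omega) / \theta$, i.e. the training duration $\alpha l / \theta$ multiplied by the fraction of time spent in that state, together with $\theta = \eta/(2\eta+1)$ and $Z = 2\eta^2 + 3\eta + 1$. The function name `training_samples` and the string state labels are hypothetical.

```python
def training_samples(eta, alpha, budget):
    """Training samples per activity state of the three-node chain.

    Assumes m_omega = alpha * budget * pi(omega) / theta, with the
    equalizing back-off rates nu1 = nu3 = eta and nu2 = eta * (eta + 1).
    """
    Z = 2 * eta**2 + 3 * eta + 1
    theta = eta / (2 * eta + 1)
    pi = {"0": 1 / Z,
          "e1": eta / Z,
          "e2": eta * (eta + 1) / Z,
          "e3": eta / Z,
          "e1+e3": eta**2 / Z}
    return {s: alpha * budget * p / theta for s, p in pi.items()}
```

With $\eta = 1$, $\alpha = 0.4$, and $l = 10$ this gives 2, 2, 4, 2, and 2 samples for the states $0$, $e_1$, $e_2$, $e_3$, and $e_1 + e_3$. The count for $e_2$ stays at $\alpha l$ for every $\eta$, and the counts always add up to the training duration $\alpha l / \theta$.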

By weighting the individual generalization accuracies (23),
we obtain
                                                       (28)
V. Examples 

In Section IV we derived the operational lifetime $U$, the
number of training samples $m_\omega$ and the operational classification
accuracy $A$ for a wireless sensor network with random-access
communication as a function of the back-off rate $\nu$
and the fraction of the lifetime spent in training $\alpha$. Here
we numerically evaluate these quantities for the special cases
of independent nodes and the three-node linear network. We
include a comparison to the Bayes optimal detector with
known likelihood functions and see that the accuracy behavior
is markedly different. Additionally, we see that there are two
different regimes in the accuracy behavior as a function of the
back-off rate, the second regime different than that usually
seen in statistical learning. The overall behavior is unique due
to the combination of CSMA and FDA.

Fig. 1. Operational lifetime as a function of back-off rate.

Fig. 2. Expected number of active sensors as a function of back-off rate.



A. Independent Nodes 

We consider a network of $n = 8$ independent nodes with
$l = 100$ transmissions allowed by the battery per node. The
sensor measurement noise variance is set to $\sigma^2 = 1$. Other
parameter settings produce qualitatively similar results. First,
in Fig. 1 we plot the operational lifetime $U$ as a function
of the back-off rate $\nu$ for a fixed lifetime fraction devoted
to training: $\alpha = 0.2$. The operational lifetime is very high
with low back-off rate because the system is mostly in the
state with $k = 0$ active sensors, which does not drain sensor
batteries at all. Once states with more active sensors become
more probable with increasing back-off rate, the lifetime drops
rapidly to $U = (1 - \alpha) l$, the lifetime in the case where all nodes
are always active. Fig. 2 shows the expected number of active
sensors $k$ as a function of $\nu$.

One of the components of the generalization accuracy
expression (25) is the number of active sensors $k$; the other is
$m_k$, the number of training samples. In Fig. 3 we set $\alpha = 0.2$,
and plot $\bar{m}_k$, the weighted average over $k$ of $m_k$:

$$\bar{m}_k = \sum_{k=0}^{n} \pi(k) \, m_k. \tag{29}$$

Fig. 3. Expected number of training samples per state classifier as a function
of back-off rate.

Fig. 4. Operational classification accuracy as a function of back-off rate.
Interestingly, this number is not monotonically decreasing as a
function of $\nu$ like we see with the operational lifetime. This is
because when several different states all have non-negligible
probability, the acquired training samples are divided among all of
the different states. Initially the number of training samples is
very high because almost all of the training samples are for
the state with no active sensors. For large $\nu$ the number of
training samples approaches $\alpha l$.
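The non-monotone shape of the weighted average number of training samples can be reproduced with the sketch below. It assumes the per-$k$ sample count $m_k = \alpha l \binom{n}{k} \nu^{k-1} / (\nu+1)^{n-1}$ and the binomial stationary probabilities of the independent-node case; the function names are hypothetical and the parameter values ($n = 8$, $l = 100$, $\alpha = 0.2$) are those of this section.

```python
from math import comb

def pi_k(k, n, nu):
    """Stationary probability of k active nodes (independent nodes)."""
    return comb(n, k) * nu**k / (nu + 1)**n

def m_k(k, n, nu, alpha, budget):
    """Assumed number of training samples for patterns with k active nodes."""
    return alpha * budget * comb(n, k) * nu**(k - 1) / (nu + 1)**(n - 1)

def m_bar(n, nu, alpha, budget):
    """Average of m_k over k, weighted by the stationary probabilities."""
    return sum(pi_k(k, n, nu) * m_k(k, n, nu, alpha, budget)
               for k in range(n + 1))

# Sweep the back-off rate over several orders of magnitude.
sweep = [(nu, m_bar(8, nu, 0.2, 100)) for nu in (0.01, 0.1, 1.0, 10.0, 100.0)]
```

Under these assumptions the average is very large for small $\nu$ (the empty state dominates), dips well below $\alpha l = 20$ at intermediate rates where the samples are spread over many states, and climbs back toward 20 as $\nu$ grows.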

Having looked at $k$ and $\bar{m}_k$, we now examine
the accuracy $A$ as a function of $\nu$, plotted in Fig. 4 for
$\alpha = 0.2$. The figure also shows the detection accuracy of the
Bayes optimal decision rule with known likelihood functions.
The Bayes optimal accuracy is monotonically increasing in $\nu$,
following the expected value of $k$. On the other hand, the FDA
classification accuracy first increases in $\nu$, starts decreasing
with local bumps, and then increases. The local bumps arise
from the generalization accuracy behavior for different states,
which are shown in Fig. 5. Specifically, the figure shows
the different $\Phi(\cdot)$ components of $A$ given in (25); these are
functions of $\nu$ because the $m_k$ are.

Fig. 5. Operational classification accuracy of the different state classifiers
as a function of back-off rate.
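The monotone Bayes baseline can be sketched under two assumptions: that the accuracy of the Bayes rule for equal priors and a shared covariance is $\Phi(\delta/2)$, with $\delta^2 = k/\sigma^2$ for $k$ active i.i.d. sensors, and that the plotted curve averages this quantity over the stationary distribution of $k$. The function names are hypothetical.

```python
from math import comb, erf, sqrt

def Phi(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bayes_accuracy(n, nu, sigma2=1.0):
    """Stationary average of the Bayes accuracy Phi(delta_k / 2)."""
    total = 0.0
    for k in range(n + 1):
        p = comb(n, k) * nu**k / (nu + 1)**n      # probability of k active nodes
        total += p * Phi(0.5 * sqrt(k / sigma2))  # delta_k = sqrt(k) / sigma
    return total
```

For $n = 8$ and $\sigma^2 = 1$ this rises from 0.5 at very small back-off rates (no sensors active, so pure guessing) toward $\Phi(\sqrt{2})$ when all eight sensors are active almost all the time, with no dip in between. No training samples enter this expression, which is why the overfitting regime is absent from the Bayes curve.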

The phenomenon of overfitting, demonstrated in [6], is that
for a fixed number of training samples and an increasing number
of sensors, the generalization accuracy first increases and
then decreases. Conversely, for a fixed number of sensors and
an increasing number of training samples, the generalization
accuracy monotonically increases. With the wireless sensor
network with CSMA communication, both of these effects
intermingle as a function of $\nu$ because both the number of
active sensors and the number of training samples change.
The initial increase and decrease in $A$ is the manifestation of
overfitting, where the generalization accuracy is best around
$k = 3$ and $k = 4$. For large $\nu$, the number of active sensors is
essentially fixed at $k = n$ (seen in Fig. 2) and the number
of training samples increases (seen in Fig. 3), resulting in
improving classification accuracy.

Finally, we examine the relationship between lifetime and
accuracy in Fig. 6. For comparison, the figure shows the
relationship for the Bayes optimal decision rule, in which
there is no lifetime devoted to training, only to operation. All
curves represent parametric functions of $\nu$, and correspond to
different values of $\alpha$ ranging from 0.1 to 0.9. Different values
of $\alpha$ contribute to the frontier of the relationship, the parts
of the curve closest to the Bayes decision rule and closest to
the top right corner of the plot. At the extreme of random
guessing, i.e. $A = 0.5$, lifetime is maximized by not doing
any training, i.e. $\alpha = 0$. Small but increasing values of $\alpha$ then
contribute to the frontier until a point when smaller values
of $\alpha$ abruptly again become part of the frontier. Very large
values of $\alpha$ contribute to the frontier only when the very best
accuracies possible are desired.

Fig. 6. Relationship between operational lifetime and operational classification
accuracy.

Fig. 7. Operational lifetime as a function of back-off rate.

B. A Three-Node Linear Network 

Having seen quite interesting behaviors for independent 
nodes, we now turn to a three-node linear network with 
correlated measurements and conflict graph preventing sensors 
1 and 2, and sensors 2 and 3 from transmitting simultaneously. 
We present similar plots as in Section V-A with $l = 10$ and
$\sigma^2 = 1$. We present results for $g(d(v_1, v_2)) = g(d(v_2, v_3)) =$
i as the distance-based correlations. For the dependent linear 
network case, we see more or less the same behavior as for the
independent nodes in Fig. 7 to Fig. 12. Fig. 7, Fig. 9, and Fig. 11
are given for $\alpha = 0.4$. One difference from the independent
nodes case is that for the $e_2$ state, $m_\omega$ is constant and not a
function of $\eta$.

VI. Conclusion and Outlook 

In this paper we proposed a model to investigate the
interaction between generalization accuracy and operational
lifetime in wireless sensor networks. We demonstrate that this
relationship is highly nontrivial, due to the joint effects of
overfitting, the number of training samples and the changing
weights of the various states. The two special cases for which
we provide result plots are qualitatively similar, and changing
the conflict graph and spatial correlation does not affect the
general behavior.

Fig. 8. Expected number of active sensors as a function of back-off rate.

Fig. 9. Expected number of training samples per state classifier as a function
of back-off rate.

Fig. 10. Operational classification accuracy as a function of back-off rate.

Fig. 12. Relationship between operational lifetime and operational classification
accuracy.

For small increasing back-off rates, the accuracy improves 
until peaking. At intermediate back-off rates, the accuracy gets 
worse due to overfitting, and then improves again for large 
back-off rates due to increasing training samples per state. 
Due to these different regimes of increasing and decreasing 
accuracy in the back-off rate, along with different values 
of the fraction of lifetime to spend in training affecting 
both the operational lifetime and the accuracy, setting the 
parameters $\nu$ and $\alpha$ to achieve certain target performance is
not straightforward. The parameterized curves in Fig. 6 and
Fig. 12 give (not necessarily intuitive) recommendations for
balancing lifetime and accuracy as a function of the back-off
rate and training fraction parameters.

The classification and communication models we have used, 
i.e. FDA with GMRF-dependent sensor measurements and 
binary exponential back-off mechanism, are certainly simpli- 
fied, but are general and amenable to analysis. The guidelines 
and behaviors we see will transfer over in a broad sense to 
other classifiers and other similar random-access communica- 
tion protocols. The accuracy behavior that we see is partly 
due to the way we deal with measurements from different 
states through separate classifiers, but the general complicated 
behavior ought to remain if we take another approach. 

A. Outlook 

Having made the connection between lifetime and accuracy
in Fig. 6 and Fig. 12, the next step is to find the values of $\alpha$
and $\nu_i$ that achieve a certain target performance. For example,
we may want to maximize the lifetime of the network, subject
to certain accuracy constraints $\beta \in (0, 1)$:

$$(\alpha^*, \nu^*) = \arg\max \; U(\alpha, \nu_1, \ldots, \nu_n) \quad \text{s.t.} \quad A(\alpha, \nu_1, \ldots, \nu_n) \geq \beta. \tag{30}$$

The optimization problem (30) is non-convex (as illustrated in
Fig. 6), and we may approximate its solution using numerical
methods. Some preliminary results are shown in Figs. 13
and 14, where we plot the solution to (30) for increasing $\beta$,
in the model with $n$ independent nodes with the parameters
as in Section V. Fig. 13 shows that $\nu$ increases almost
monotonically, and jumps to infinity around $\beta = 0.772$.
In practice we see that the back-off rate is constrained by
physical limitations and by the communication protocol, so
$\nu$ is bounded from above. Fig. 14 shows a more irregular
behavior for $\alpha$, with a sharp drop when $\nu$ jumps to infinity.

The non-monotonicity of the classification accuracy in the
back-off rates makes an analytic approach to optimization
difficult, and an alternative solution would be to approximate
the expression for the detection accuracy (23) with some convex
function. This would reduce the complexity of numerical
optimization, and may even allow for analytical results.

Fig. 14. Plotting $\alpha^*$ as a function of the desired accuracy.

The effect of overfitting for medium back-off rates can be
mitigated by choosing different back-off rates for the training
stage and the operational stage. For example, choosing larger
back-off rates during training should increase the number of 
samples for states with many active nodes, thus reducing the 
risk of overfitting. Although this would simultaneously reduce 
the number of samples for smaller states, the risk of overfitting 
is not as high there due to the smaller number of active nodes. 

Another direction for future research is to model temporal 
correlation in the sensor measurements in addition to spatial 
correlation. In the present work, successive measurements 
in time are assumed independent, but including temporal 
correlation is more realistic [17]. If temporal correlation is part
of the sensing and classification model, its interaction with the 
temporal back-off mechanism may produce quite interesting 
phenomena. A Markov model for temporal correlation could 
be analyzed together with the Markov activity process of the 
CSMA model. 

Finally, we also mention that asymptotic analysis is of
interest in the future study of this cross-layer supervised
learning and random-access communication setup. Developing
expressions for the three-node dependent network, e.g., (27)
and (28), requires us to keep track of many details; larger
networks will require us to keep track of many more. By
performing an asymptotic analysis of an increasing number of
randomly placed sensor nodes with constant density in [6], we
are able to eliminate many such details in the sensor network
generalization error using geometric probability [35]. Having
now set forth this extended model with CSMA communication,
similar asymptotic analysis using geometric probability
is certainly warranted.